It is well established in neuroscience that color vision plays an essential part in the human visual perception system. Meanwhile, many novel designs for computer vision inspired by human vision have achieved success in a wide range of tasks and applications. Nonetheless, how color differences affect machine vision has not been well explored. Our work tries to bridge this gap between the human color vision aspect of visual recognition and that of the machine. To achieve this, we curate two datasets: CIFAR10-F and CIFAR100-F, which are based on the foreground colors of the popular CIFAR datasets. Together with CIFAR10-B and CIFAR100-B, the existing counterpart datasets with information on the background colors of CIFAR test sets, we assign each image based on its color contrast level per its foreground and background color labels and use this as a proxy to study how color contrast affects machine vision. We first conduct a proof-of-concept study, showing the effect of color difference and validate our datasets. Furthermore, on a broader level, an important characteristic of human vision is its robustness against ambient changes; therefore, drawing inspirations from ophthalmology and the robustness literature, we analogize contrast sensitivity from the human visual aspect to machine vision and complement the current robustness study using corrupted images with our CIFAR-CoCo datasets. In summary, motivated by neuroscience and equipped with the datasets we curate, we devise a new framework in two dimensions to perform extensive analyses on the effect of color contrast and corrupted images: (1) model architecture, (2) model size, to measure the perception ability of machine vision beyond total accuracy. We also explore how task complexity and data augmentation play a role in this setup. Our results call attention to new evaluation approaches for human-like machine perception.
translated by 谷歌翻译
手语制作(SLP)旨在将语言的表达方式转化为手语的相应语言,例如基于骨架的标志姿势或视频。现有的SLP型号是自动回旋(AR)或非自动入口(NAR)。但是,AR-SLP模型在解码过程中遭受了回归对均值和误差传播的影响。 NSLP-G是一种基于NAR的模型,在某种程度上解决了这些问题,但会带来其他问题。例如,它不考虑目标符号长度,并且会遭受虚假解码启动的影响。我们通过知识蒸馏(KD)提出了一种新型的NAR-SLP模型,以解决这些问题。首先,我们设计一个长度调节器来预测生成的符号姿势序列的末端。然后,我们采用KD,该KD从预训练的姿势编码器中提取空间语言特征以减轻虚假解码的启动。广泛的实验表明,所提出的方法在特里切特的手势距离和背面翻译评估中都显着优于现有的SLP模型。
translated by 谷歌翻译
域适应(DA)最近在医学影像社区提出了强烈的兴趣。虽然已经提出了大量DA技术进行了用于图像分割,但大多数这些技术已经在私有数据集或小公共可用数据集上验证。此外,这些数据集主要解决了单级问题。为了解决这些限制,与第24届医学图像计算和计算机辅助干预(Miccai 2021)结合第24届国际会议组织交叉模态域适应(Crossmoda)挑战。 Crossmoda是无监督跨型号DA的第一个大型和多级基准。挑战的目标是分割参与前庭施瓦新瘤(VS)的后续和治疗规划的两个关键脑结构:VS和Cochleas。目前,使用对比度增强的T1(CET1)MRI进行VS患者的诊断和监测。然而,使用诸如高分辨率T2(HRT2)MRI的非对比度序列越来越感兴趣。因此,我们创建了一个无人监督的跨模型分段基准。训练集提供注释CET1(n = 105)和未配对的非注释的HRT2(n = 105)。目的是在测试集中提供的HRT2上自动对HRT2进行单侧VS和双侧耳蜗分割(n = 137)。共有16支球队提交了评估阶段的算法。顶级履行团队达成的表现水平非常高(最佳中位数骰子 - vs:88.4%; Cochleas:85.7%)并接近完全监督(中位数骰子 - vs:92.5%;耳蜗:87.7%)。所有顶级执行方法都使用图像到图像转换方法将源域图像转换为伪目标域图像。然后使用这些生成的图像和为源图像提供的手动注释进行培训分割网络。
translated by 谷歌翻译
Semantic communication (SemCom) and edge computing are two disruptive solutions to address emerging requirements of huge data communication, bandwidth efficiency and low latency data processing in Metaverse. However, edge computing resources are often provided by computing service providers and thus it is essential to design appealingly incentive mechanisms for the provision of limited resources. Deep learning (DL)- based auction has recently proposed as an incentive mechanism that maximizes the revenue while holding important economic properties, i.e., individual rationality and incentive compatibility. Therefore, in this work, we introduce the design of the DLbased auction for the computing resource allocation in SemComenabled Metaverse. First, we briefly introduce the fundamentals and challenges of Metaverse. Second, we present the preliminaries of SemCom and edge computing. Third, we review various incentive mechanisms for edge computing resource trading. Fourth, we present the design of the DL-based auction for edge resource allocation in SemCom-enabled Metaverse. Simulation results demonstrate that the DL-based auction improves the revenue while nearly satisfying the individual rationality and incentive compatibility constraints.
translated by 谷歌翻译
Edge-assisted vehicle-to-everything (V2X) motion planning is an emerging paradigm to achieve safe and efficient autonomous driving, since it leverages the global position information shared among multiple vehicles. However, due to the imperfect channel state information (CSI), the position information of vehicles may become outdated and inaccurate. Conventional methods ignoring the communication delays could severely jeopardize driving safety. To fill this gap, this paper proposes a robust V2X motion planning policy that adapts between competitive driving under a low communication delay and conservative driving under a high communication delay, and guarantees small communication delays at key waypoints via power control. This is achieved by integrating the vehicle mobility and communication delay models and solving a joint design of motion planning and power control problem via the block coordinate descent framework. Simulation results show that the proposed driving policy achieves the smallest collision ratio compared with other benchmark policies.
translated by 谷歌翻译
最近的研究提出了一系列针对深度任务模型的专业优化算法。通常声称这些多任务优化(MTO)方法产生的解决方案优于仅通过优化任务损失的加权平均值而获得的解决方案。在本文中,我们对各种语言和视觉任务进行大规模实验,以检查这些主张的经验有效性。我们表明,尽管这些算法的设计和计算复杂性增加了,但MTO方法并未产生超出传统优化方法可实现的性能的任何改进。我们强调了替代策略,这些策略始终如一地提高性能概况,并指出可能导致次优效果的常见训练陷阱。最后,我们概述了可靠地评估MTO算法的性能并讨论潜在解决方案的挑战。
translated by 谷歌翻译
可重新配置的智能表面(RIS)可以显着增强TERA-HERTZ大量多输入多输出(MIMO)通信系统的服务覆盖范围。但是,获得有限的飞行员和反馈信号开销的准确高维通道状态信息(CSI)具有挑战性,从而严重降低了常规空间分裂多次访问的性能。为了提高针对CSI缺陷的鲁棒性,本文提出了针对RIS辅助TERA-HERTZ多用户MIMO系统的基于深度学习的(DL)基于速率的多访问(RSMA)方案。具体而言,我们首先提出了基于DL的混合数据模型驱动的RSMA预编码方案,包括RIS的被动预编码以及模拟主动编码和基本站(BS)的RSMA数字活动预码。为了实现RIS的被动预码,我们提出了一个基于变压器的数据驱动的RIS反射网络(RRN)。至于BS的模拟主动编码,我们提出了一个基于匹配器的模拟预编码方案,因为BS和RIS采用了Los-Mimo天线阵列结构。至于BS的RSMA数字活动预码,我们提出了一个低复杂性近似加权的最小均方误差(AWMMSE)数字编码方案。此外,为了更好地编码性能以及较低的计算复杂性,模型驱动的深层展开的主动编码网络(DFAPN)也是通过将所提出的AWMMSE方案与DL相结合的。然后,为了在BS处获得准确的CSI,以实现提高光谱效率的RSMA预编码方案,我们提出了一个CSI采集网络(CAN),具有低飞行员和反馈信号开销,下行链接飞行员的传输,CSI在此处使用CSI的CSI反馈。 (UES)和BS处的CSI重建被建模为基于变压器的端到端神经网络。
translated by 谷歌翻译
本文提出了一种简单,准确且计算上有效的方法,以将欧几里得空间中开发的普通无气体滤波器应用于在流形上发展的系统。我们使用称为稳定嵌入的数学理论来使无味的Kalman滤波器保持状态估计,以保持状态估计值在表现出色的估计性能的同时,与歧管近距离近距离。我们通过将其应用于卫星系统模型并将其与其他专门针对歧管上系统设计的非意识到的卡尔曼过滤器进行比较,确认了我们设计的过滤器的性能。我们设计的过滤器的估计误差很低,可以使状态估计与预期的歧管密切相邻,并消耗少量的计算时间。同样,我们设计的过滤器非常简单易用,因为我们的过滤器直接采用了在欧几里得空间中设计的现成的标准的无气味卡尔曼滤波器,而没有任何特定的歧管结构构造的离散方法或坐标转换。
translated by 谷歌翻译
集成感应和通信(ISAC)代表范式转移,以前竞争的无线传输是共同设计的,可通过共同使用硬件平台来提高光谱,能源和硬件效率来和谐地运行。但是,由于诸如褪色和堵塞之类的对抗性因素,ISAC无融合可能会遭受高感知不确定性的影响。本文提出了一个多点ISAC(MPISAC)系统,该系统通过利用多雷达数据冗余来融合来自多个ISAC设备的输出,以实现更高的感应性能。此外,我们建议通过功能选择模块有效地探索传感和通信之间的性能权衡,该功能选择模块可适应地确定ISAC设备的工作状态(即传感或通信)。我们方法的症结在于采用融合模型,该模型通过假设检验和最佳投票分析来预测融合精度。仿真结果表明,MPISAC优于各种基准方案,并表明所提出的方法可以有效地跨越ISAC系统中的权衡区域。
translated by 谷歌翻译
准确地估算主要山区盆地中的积雪对于水资源经理来说至关重要,以便做出影响当地和全球经济,野生动植物和公共政策的决策。目前,此估计需要多个配备LIDAR的飞机飞行或原位测量值,两者均昂贵,稀疏和对可访问区域有偏见。在本文中,我们证明了来自多个,公开可用的卫星和天气数据源的空间和时间信息的融合,可以估算关键山区的积雪。我们的多源模型的表现优于单源估计值5.0英寸RMSE,并且优于稀疏的原位测量值的估计值1.2英寸RMSE。
translated by 谷歌翻译